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Assessment of circulating CD4 count change over time in HIV- 
infected subjects on antiretroviral therapy (ART) is a central compo- 
nent of disease monitoring. The increasing number of HIV-infected 
subjects starting therapy and the limited capacity to support CD4 
count testing within resource-limited settings have fueled interest in 
identifying correlates of CD4 count change such as total lympho- 
cyte count, among others. The application of modeling techniques 
will be essential to this endeavor due to the typically nonlinear CD4 
trajectory over time and the multiple input variables necessary for 
capturing CD4 variability. We propose a prediction-based classifica- 
tion approach that involves first stage modeling and subsequent clas- 
sification based on clinically meaningful thresholds. This approach 
draws on existing analytical methods described in the receiver op- 
erating characteristic curve literature while presenting an extension 
for handling a continuous outcome. Application of this method to an 
independent test sample results in greater than 98% positive predic- 
tive value for CD4 count change. The prediction algorithm is derived 
based on a cohort of n = 270 HIV-1 infected individuals from the 
Royal Free Hospital, London who were followed for up to three years 
from initiation of ART. A test sample comprised of n = 72 individ- 
uals from Philadelphia and followed for a similar length of time is 
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used for validation. Results suggest that this approach may be a use- 
ful tool for prioritizing limited laboratory resources for CD4 testing 
after subjects start antiretroviral therapy. 

1. Introduction. Chronic HIV infection results in the progressive deple- 
tion of CD4+ T lymphocytes from both lymphoid tissues and peripheral 
blood. Thus, the monitoring of peripheral blood CD4 count is the stan- 
dard used in decision-making concerning initiation of antiretroviral therapy 
(ART), as well as monitoring response to ART over time. In 2002 and again 
in 2006, the World Health Organization (WHO) proposed guidelines for ad- 
ministration of ARTs in an effort to provide a clear public health approach to 
utilization of these limited, yet very powerful drugs [WHO-Report (2006)] . 
This series of recommendations includes routine collection and monitoring 
of CD4 counts to inform decisions regarding both initiation and switching 
of drug regimens. However, this report also acknowledges that collection of 
repeated CD4 counts may not be feasible in resource-limited settings due to 
the high costs associated with such monitoring. In these instances, clinicians 
are advised to initiate therapy in patients with asymptomatic HIV disease 
if total lymphocyte count (TLC) falls below 1200 cells/mm 3 . 

In this manuscript we consider modeling strategies for using alternative 
surrogate markers within an acute window (3 years) post-initiation of ther- 
apy. Since publication of the WHO guidelines, several reports have been 
published on the clinical utility of alternative surrogate markers for moni- 
toring post-therapy response and specifically the correlation between these 
markers and CD4 count [Bagchi et al. (2007); Bisson et al. (2006); Fer- 
ris et al. (2004); Mahajan et al. (2004); Kamya et al. (2004); Badri and 
Wood (2003); Bedell et al. (2003); Spacek et al. (2003); Kumarasamy et al. 

(2002) ]. These investigations involve both cross-sectional and longitudinal 
data and implement a variety of straightforward analytical methods. Typ- 
ically, cross-sectional comparisons between CD4 count and TLC as well as 
longitudinal comparisons between the change in each of these variables over 
a specified time period are performed using correlation analysis [Badri and 
Wood (2003); Kamya et al. (2004); Kumarasamy et al. (2002); Spacek et al. 

(2003) ]. A summary of analytic strategies described for these settings, and 
their potential limitations, is given in the discussion; notably, the scientific 
findings of these reports are variable. 

In this manuscript we describe a prediction-based classification (PBC) 
framework for predicting biomarker trajectories based on a binary decision 
rule. PBC was originally described in the setting of classifying HIV ge- 
netic variants that capture variability in a cross-sectional response to ART 
[Foulkes and DeGruttola (2002, 2003)]. Within this framework, we present 
two estimation procedures that both involve first stage modeling using a gen- 
eralized linear mixed effect model (GLMM). In the first case, we dichotomize 
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the biomarker a priori and use a logit link function. In this case, our ap- 
proach reduces simply to fitting a logistic model coupled with a receiver 
operator characteristic (ROC) curve analysis, which is commonly applied in 
practice though it has not been described for this setting. The second esti- 
mation approach we present is based on fitting a linear mixed effects model 
to the observed CD4 count, as measured on a continuous scale. This later 
approach may offer improved predictive performance since it incorporates 
the full range of the continuous scale data. We describe both approaches 
further in Section 2. Section 3 then illustrates the method through appli- 
cation to two cohorts of HIV-1 infected individuals followed for three years 
after initiation of ART. Some simple extensions are described in Section 4 
and finally we offer a discussion of how the approaches complement existing 
methods in Section 5. 

2. Methods. Monitoring patient level CD4 counts over time may involve 
consideration of the observed counts at a given time point, the percent 
change in counts across a given period of time or some other function of pa- 
tient level data. In general, interest lies in determining whether this function 
of the data is above or below a threshold value. For example, in monitor- 
ing absolute CD4 counts, thresholds of 200 and 350 are considered within 
well-established treatment administration guidelines. A threshold of 20%, 
on the other hand, is common for monitoring the percent change in CD4 
between visits over time. We begin in this section by describing a general 
modeling framework. We then present an approach for predicting whether 
absolute CD4 is above a clinically meaningful threshold, at each of multiple 
discrete time points. In Section 4 we consider extensions of this framework 
that allow us to consider functions of the biomarker under study, such as 
percentage change over a given time period. 

2.1. Generalized linear mixed effects model. Consider the generalized lin- 
ear mixed effects model (GLMM) given by 

(2.1) g(E[Y i ])=X l (3 + Z i b i , 

where Yj = (Yn, . . . , Yi ni ) T is a vector of the ni responses for individual i, 
g(-) is a link function, Xj is the rii x M corresponding design matrix across 

M covariates, (3 is the fixed-effects parameter vector and 6j MVN(0,D). 
Here Z, is the design matrix for the random effects and will typically include 
both an intercept and time component. One choice of X,; and Zj is offered in 
the example of Section 3 and includes time varying values of white blood cell 
count and lymphocyte percentage. This model is a natural choice for this 
setting since repeated measures are taken over time on the same individual 
and the time points are unevenly spaced across individuals [Fitzmaurice, 
Laird and Ware (2004)]. 
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In this manuscript we consider two approaches to fitting the model of 
equation (2.1). Since ultimately we are interested in predicting whether 
CD4 count is above (or below) a given threshold, we begin by modeling 
a dichotomized version of the observed CD4 data. We use the notation KJ 
to indicate this binary representation of the observed data. That is, we de- 
fine the dependent variable = J(CD4jj > K), where CD4jj is the CD4 
count at the jth time point for individual i and K is set equal to a clinically 
meaningful threshold. In this case, the canonical logit link is used to model 
the resulting binary outcome. Formally, if we let 9ij = = Pr(Y^~ = 1), 

then equation (2.1) reduces in this setting to 

(2 2) 9i = ex PlxijP + Zijbi] 

lj l + exp[xy/3 + Zy&;]' 

where Xjj and Zj,- are the rows of Xj and Zj respectively, corresponding to 
the jth measurement for individual i. 

Second, we explore the utility of using the full range of the CD4 count 
data by modeling CD4 as a continuous variable. That is, we let Yij = CD4jj 
and g(-) be the identity function, so that the model of equation (2.1) reduces 
to the linear mixed effects model (LMM), given by 

(2.3) Y^ = Xjj/3 -|- Z{jbi -\- £iji 

where ~ -W(0, a 2 ) and bi _L e^. Since we ultimately aim to predict whether 
CD4 is above a given threshold, we then derive a prediction rule based on 
the estimated mean and variance components from this model. 



2.2. Prediction-based classification. In fitting the mixed effects model 
of equation (2.1), we use the complete vector of observed data, given by 
Yi = (ViOi ■ ■ ■ ,1/ini), for all individuals in our learning sample. In general, we 
want to make predictions for new individuals under the assumption that only 
baseline values of yi, given by y^, are observed. In the usual model fitting 
context, the predicted y is generated using the empirical Bayes estimates 
of bi, given by bi = E\b\yi\. Notably, this conditions on this complete data 
vector and thus is not applicable to our setting, in which only the yio are 
available. Thus, we need to arrive at an alternative estimate of the random 
effects that conditions only on the observed data for new individuals. We 
consider two approaches in the context of the linear mixed model. In the 
first case, we replace with Xj/3 in the formula for bi. This is our primary 
approach, described in Section 2.2.2 and applied in the example of Section 3. 
The second alternative we consider is to replace with the baseline measure 
j/iOj which is presented as an extension in Section 4. 
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2.2.1. Binary outcome. After fitting the model of equation (2.1), mean 
and variance parameter estimates can be used to arrive at a predicted mean 
response for individual i at the jth time point. Consider first the case in 
which we dichotomize CD4 count and fit the GLMM with a logit link, as 
described by equation (2.2). In this case, we have the predicted probability of 
CD4 count being above the threshold K at the jth time point for individual 
i given by 



(2.4) 



1 + 



where /3 is a maximum likelihood estimate of /3 and bi = E[bi\yf] is the 
conditional mean of the random effects for individual i, given the observed 
data yf. Numerical integration techniques, such as Gaussian quadrature, 
are required for model fitting in this setting since no simple, closed-form 
solutions to maximum likelihood estimation are available. 

A simple approach to prediction in this case is to let the predicted out- 
come, given by y^, equal 1 if 0y > 0.50 and otherwise, where 6^ is defined 
by equation (2.4). Alternatively, we may want to choose a prediction rule 
that controls a clinically meaningful attribute. For example, in the CD4 pre- 
diction setting, we may want to control the false positive rate, defined as 
the proportion of individuals predicted to be above a safety threshold, when 
in fact their CD4 counts are below this safe limit. In this case, we define 
multiple rules, termed a-prediction rules, that are given by 

jl, if%>l-a, 



1 0, otherwise, 

where the unobserved 9ij is replaced with the estimate 9ij. Notably, in mak- 
ing predictions for new individuals, the complete vector y + is not available 
and, thus, 6j = E[bi\y~?~] in equation (2.4) cannot be calculated. In the ex- 
ample provided below, we let bi = E[bi] = for all i in our test sample. An 
alternative approach for the linear model setting is described in Section 4. 

Table 1 

Contingency table notation for a given 
a-prediction rule 
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Based on a given a-prediction rule, we can generate the contingency ta- 
ble given in Table 1. Here the n^'s are the corresponding cell counts for 
k,l = 1,2. For example, ran is the number of observations that are ob- 
served to be above the threshold {yf- = 1) and predicted to be above the 
threshold (yj a = 1). The sensitivity of this rule is defined as the proba- 
bility of correctly predicting an observation as being above the threshold 
among those responses that are in fact above the threshold and is given 
algebraically as Pi(yfj a = Mvtj = -0 = n u/ n -i- The corresponding speci- 
ficity is given by Pr(y^ a = 0|yj = 0) = ^22/^-2 and the false positive rate 
is FP a = 1 — specificity = n^/n^- Positive predictive value (PPV) and neg- 
ative predictive value (NPV) are given by (nn/rai.) and ^22/^2-), respec- 
tively. By varying the value of a in equation (2.5), we generate multiple 
prediction rules and can construct a corresponding receiver operator char- 
acteristic (ROC) curve, which offers a visual representation of the trade-off 
between sensitivity and specificity. Specifically, an ROC curve is defined as a 
plot of the false positive rate (x-axis) and corresponding sensitivity (y-axis) 
for each of multiple classifiers, in our case prediction rules. In our setting, 
each a-rule contributes one point to the ROC curve. We define the opti- 
mal rule as the one that controls the FP rate at a specified level, though 
alternative criterion are equally applicable. 

Since the prediction rule given by equation (2.5) depends on an esti- 
mate of Oij that is derived based on the data, a cross-validation approach is 
necessary to obtain accurate estimates of predictive performance, including 
sensitivity and false positive rate. The motivation for this stems from the 
need to characterize the ability to make predictions on observations that did 
not contribute to the model fitting procedure. In this manuscript, we use 
an independent test sample to evaluate model performance. The approach 
proceeds as follows: First, model parameters are estimated using data aris- 
ing from what we refer to as the learning sample. Second, the best a-rule 
is identified based on the trade-off between sensitivity and specificity, again 
using the learning sample data. The estimates of predictive performance 
(e.g., false positive rate) based on the learning sample are referred to as 
resubstitution estimates as the data used for estimating error rates are the 
same as those used for deriving the prediction rule. Finally, measures of pre- 
dictive performance for the chosen a-rule are reported based on applying 
the rule to an independent data set, which we refer to as the test sample 
data. These test sample estimates are considered unbiased reflections of pre- 
dictive performance, as independent data sets are used to generate the rule 
and describe its performance. 

2.2.2. Continuous outcome. The prediction approach just described for 
a binary outcome involves simply fitting a logistic regression model and then 
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generating an ROC curve based on several probability cutoffs. While, to our 
knowledge, this has not been applied to the setting of modeling biomarker 
trajectories over time and specifically to CD4 monitoring, similar approaches 
are used in practice in other settings [Tosteson et al. (1994); Tosteson and 
Begg (1988)]. One reason that this approach may not be optimal for the 
present setting is that CD4 count is measured on a continuous scale. We thus 
consider a simple extension of this approach that takes into consideration 
the full range of the observed CD4 count data. We begin by modeling y^ = 
CD4jj as a quantitative biomarker, using the linear mixed effects model of 
equation (2.3), and then derive a prediction approach similar to the one 
described by equation (2.5). 

The model derived predicted value of is given by y?- = Xjj/3 + z^frj. 
Here x^- and Zy are again respectively the rows of Xj and Zj corresponding 
to the jth measurement for individual i, j3 = ^^ 1 (X^S~ 1 Xj)~ 1 X^S,jyj is 
the least squares estimate of (3, bi = E(bi\yi) = DZ^S7" 1 (y i — Xj/3) is the 
best linear unbiased predictor (BLUP) of the random effects for individual 
i, £» = Var(vi) = Z^DZf + a 2 I, and D and a 2 are the restricted maxi- 
mum likelihood estimates of D and a 2 , respectively. Rather than estimate 
Oij = Pr(CD4jj > K) of equation (2.5), we describe a one-sided prediction 
interval approach to identify a rule that is similar to the one described by 
this equation. 

First note that the lower bound of the one-sided (1 — a) prediction interval 
for is given by 

(2.6) lij jCt = yij - z a ^\ai(yij - y^), 

where z a is the quantile of a standard normal corresponding to a 1 — a 
probability and Y&r(yij — yij) is referred to as the prediction variance. In this 
manuscript, we treat this interval as an approximate credible interval, so that 
we are (1 — a)% certain that the random variable Yij will be greater than this 
realization of the lower bound. In other words, Pr(Yij > Uj )a ) = (1 — a)%. 
Thus, if lij, a > K, we are at least (1 — a)% certain that Yij > K. In other 
words, lij^ a > K is equivalent to 6 > (1 — a). As a result, the rule given by 

(2 7) 57+ = J ^' ^kj,a>K, 

v ' Ul ^ a \0, otherwise, 

is equivalent to the one given by equation (2.5). As described in McClean, 
Sanders and Stroup (1991) and McCulloch and Searle (2001), the predic- 
tion variance is given by Var(^jj — Xjj/3 — Zjj6j) = Xjj Var (/3)x^- + Zjj Var(6, — 
bi) Z fj + Kg Cov(/3, b t - bi)zT. where Var(^) = ££ 1 (X i Er 1 Xf)- 1 , Var(6, - 
bi) = (jzZfZi + D" 1 )- 1 - Cov0,bi - bi)Xf Sr^iD and Cov0,b t - h) = 
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our setting, we are interested in the prediction vari- 
ance for a new observed value and thus have an additional a 2 term. That 
is, Var(yjj — y^) of equation (2.6) is equal to Var(yij — Xjj/3 — Zybi) + a 2 . 
The appropriateness of treating the above prediction interval as a credible 
interval depends on prior assumptions about the parameters of our model. 
Since we are using this as a means of generating a prediction rule, and not as 
a tool for inference, this approximation seems reasonable. It also performs 
well in the example provided in Section 3. A study of the relative advan- 
tages of applying a fully Bayesian approach to approximating the posterior 
predictive distribution for this data setting is ongoing research. 

Again a test sample is used to characterize model performance. In the 
linear mixed modeling setting, we note that Var(/3), D and a are estimated 
based on the model fitting procedure that uses the learning sample data. 
The remaining variance terms, Var(6j — hi) and Cov(/3, bi — bi) as well as the 
design elements and z-ij used in the calculation of lij^ a of equation (2.6) 
are based on the test sample data. Notably, in both modeling frameworks, 
the BLUPs of the random effects can not be calculated for a new individual 
for whom the response is not observed. One approach to handling this 
unobserved data is to replace yj in the formula for bi with Xj/3 so that 
bi = DZ^S^ 1 (Xj/? — Xj^) = 0. This results in reducing y^ to yij = Xij/3 
and is consistent with assigning each individual the estimated population 
average. In the example below, we use the prediction variance from the 
usual regression setting of Var(yjj — yij) = x^- Var(/3)x^- + a 2 . This prediction 
variance is less than the one described above; however, as we are varying z a 
of equation (2.6) to generate a series of classification rules, the magnitude of 
the interval is less relevant. An alternative approach for handling the random 
effects in the linear mixed modeling framework is described in Section 4. 

3. Example. The approach described in Section 2 is applied to a cohort 
of N = 270 individuals from the Royal Free Hospital, London who were fol- 
lowed for up to three years after initiation of ART. Detailed information on 
the patient population and laboratory methods can be found in Smith et al. 
(2003, 2004). The aim of our analysis is to determine the utility of baseline 
CD4 count and repeated measures on WBC and lymphocyte percentage for 
predicting CD4 counts over time. Our approach uses the complete CD4 count 
data (across all time points) from a learning sample to generate a model; 
predictions based on this model are then made, for the resubstituted data 
as well as for an independent test sample, assuming that we only observe 
the baseline values of CD4. Consideration is given to two clinically mean- 
ingful CD4 count thresholds: K = 200 and K = 350 cells/mm 3 . All analyses 
are performed using R version 2.7.1. The median length of follow-up is 25 
months and the interquartile range (IQR) for length of follow-up is (14,32) 
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months. The median number of follow-up time points is 9 with a full range 
of 2 to 24. In total, there are 2635 records including baseline measurements. 
The median baseline CD4 count for this cohort is 219.5 with an IQR equal 
to (114,333). 

Linear and generalized linear mixed effects models are fitted in R using 
the lme() and lmerQ functions of the nlme and lme4 packages, respectively. 
We assume a piecewise linear mixed effects model for modeling CD4 count 
after initiation of ART [Fitzmaurice, Laird and Ware (2004)]. This model 
is appropriate since CD4 count tends to rise rapidly for approximately one 
month and then proceeds to increase more gradually. Fixed effects for base- 
line CD4 count (on a log base 10 scale), baseline and time varying values of 
WBC and lymphocyte percentage and time before and after one month of 
follow-up are included in the model as predictors. In addition, interactions 
between each time component and baseline values of WBC and lymphocyte 
percent are included. 

The design matrix Xj for the fixed effects of equation (2.1) is thus given 

by 



(3.1) 
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£2 
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where Wio and ko are respectively baseline WBC and baseline lymphocyte 
percent, tij is time in months since initiation of ART, (tij — 1)+ is follow-up 



time after the first 1 month on ART for tij > 1 and otherwise, 



and Wij and 



kj are respectively WBC and lymphocyte percent at time iy . We define t/io 
in Xji as log(CD4) for both the linear and generalized linear model although 
the response variable, given by Y, = (Yn, . . . ,Yi ni ), is dichotomized for the 
generalized linear model setting. Notably, this model allows for two linear 
time trends, before and after 1 month of follow-up on ART. Random person 
specific intercepts and slopes before the knot are also assumed so that the 
design matrix Zj for the random effects of equation (2.1) is given by 



(3.2) 
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The random effects vector in equation (2.1) is given by bj = [&io hi] rep- 
resenting the intercept and slope before the change point for individual i. 

We begin by fitting the generalized linear model, as described in equa- 
tion (2.2). In this case, post-baseline CD4 counts are dichotomized and used 
as the outcome in the model fitting procedure. Predicted probabilities of 
being above the CD4 threshold are estimated for each post-baseline time 
point for each individual. The results of applying a probability cutoff of 0.50 
are given in Table 2(a). We call this the "naive" approach since the cutoff 
does not incorporate information about the resulting prediction rule. While 
the sensitivities of these predictions rules (0.98 and 0.90) are high for both 
thresholds, the corresponding false positive rates are also high (0.54 and 
0.28). This approach thus may not be appropriate for CD4 testing since it 
yields a high probability of falsely predicting that an individual's CD4 count 
is within a safe limit. 

Next, several a cutoffs are considered to generate multiple prediction 
rules and an ROC curve is generated, as illustrated in Figure 1(a). This is 
again based on the GLMM approach to model fitting. Data corresponding to 
rules with resubstitution FP rates of approximately (but not greater than) 
5% and 10% and CD4 threshold cutoffs of K = 200 and 350 are provided 
in Tables 2(b) and (c). Resubstitution-based summary measures are given 
in Table 3(a). Based on a CD4 threshold of K = 200, a FP rate of 0.09 
corresponds to a sensitivity of 0.61, a positive predictive value of 0.97 and a 
negative predictive value of 0.32. For the same CD4 threshold, a FP of 0.05 
corresponds to a sensitivity of 0.42, a positive predictive value of 0.98 and 
a negative predictive value of 0.25. 

Table 2 

Observed and predicted counts (based on learning sample data) 
(a) GLMM approach with a "naive" 0.50 probability cutoff 



Observed 







>200 


<200 


Total 


Predicted 


>200 


1932 


215 


2147 




<200 


34 


184 


218 




Total 


1966 


399 


2365 



Observed 







>350 


<350 


Total 


Predicted 


>350 


1194 


289 


1483 




<350 


137 


745 


882 




Total 


1331 


1034 


2365 
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Table 2 
Continued 

(b) GLMM approach 

Observed Observed 



>200 <200 Total >200 <200 Total 



Predicted* 


>200 


826 


18 


844 


1206 


37 


1243 




<200 


1140 


381 


1521 


760 


362 


1122 




Total 


1966 


399 


2365 


1966 


399 


2365 








Observed 




Observed 








>350 


<350 


Total 


>350 


<350 


Total 


Predicted* 


>350 


669 


50 


719 


880 


103 


983 




<350 


662 


984 


1646 


451 


931 


1382 




Total 


1331 


1034 


2365 


1331 


1034 


2365 






(c) 


LMM approach 












Observed 




Observed 








>200 


<200 


Total 


>200 


<200 


Total 


Predicted* 


>200 


1291 


19 


1319 


1558 


38 


1596 




<200 


675 


380 


1055 


408 


361 


769 




Total 


1966 


399 


2365 


1966 


399 


2365 








Observed 




Observed 








>350 


<350 


Total 


>350 


<350 


Total 


Predicted* 


>350 


760 


51 


811 


940 


103 


1043 




<350 


571 


983 


1554 


391 


931 


1322 




Total 


1331 


1034 


2365 


1331 


1034 


2365 



* Predicted counts are based on rules with resubstitution FP rate estimates of approxi- 
mately (but not greater than) 5% (left panels) and 10% (right panels). 

Next we fitted the linear mixed effects model, as described by equa- 
tion (2.3), to the observed CD4 count data. The resulting ROC curve illus- 
trating the sensitivity and corresponding false positive rates in this cohort 
(resubstitution estimates) is given in Figure 1(b). Count data corresponding 
to rules for which thresholds are K = 200 and 350 and the resubstitution 
FP rates are approximately (but not greater than) 5% and 10% are given in 
Table 2(b). Corresponding summaries, as well as 95% bootstrap confidence 
intervals (CIs), are reported in Table 3(b). To arrive at CIs, we repeatedly 
sample individuals with replacement and in each case, fit a linear mixed ef- 
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(b) 

Fig. 1. ROC curves based on resubstitution estimates, (a) GLMM, (b) LMM. 

fects model. The prediction rule corresponding to FP rates of approximately 
(but not greater than) 5% and 10% are selected and corresponding resubsti- 
tution estimates of sensitivity, PPV and NPV are recorded. A total of 100 
bootstraps are performed for each threshold and the fifth and ninety-fifth 
percentiles reported. 

Based on a CD4 cutoff of 200, a FP rate of 0.10 corresponds to a sensitivity 
of 0.79 [95% CI (0.74, 0.83)]. In this case, the PPV is 0.98 (0.97, 0.98) and 
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Table 3 

Estimates of predictive performance 
(a) GLMM approach 



GLMM (LS) GLMM (TS) 





Sens 


Spec 


PPV NPV Sens 


Spec 


PPV 


NPV 


K = 200: 
















LS FP < 0.05 


0.42 


0.95 


0.98 0.25 0.66 


0.96 


0.99 


0.31 


LS FP<0.10 


0.61 


0.91 


0.97 0.32 0.77 


0.96 


0.99 


0.39 


# = 350: 
















LS FP < 0.05 


0.50 


0.95 


0.93 0.60 0.61 


0.95 


0.95 


0.59 


LS FP<0.10 


0.66 


0.90 


0.90 0.67 0.79 


0.90 


0.93 


0.71 


(b) LMM approach 






LMM (LS) 






LMM (TS) 






Sens 


Spec 


PPV 


NPV 


Sens 


Spec PPV 


NPV 


K = 200: 
















LS FP < 0.05 


0.66 


0.95 


0.99 


0.36 


0.77 


0.96 0.99 


0.39 




(0.60, 0.75) 




(0.98, 0.99) 


(0.31, 0.46) 








LS FP<0.10 


0.79 


0.90 


0.98 


0.47 


0.84 


0.96 0.99 


0.49 




(0.74, 0.83) 




(0.97, 0.98) 


(0.37, 0.54) 








# = 350: 
















LS FP < 0.05 


0.57 


0.95 


0.94 


0.63 


0.73 


0.93 0.95 


0.67 




(0.44, 0.67) 




(0.92, 0.95) 


(0.56, 0.70) 








LS FP < 0.10 


0.71 


0.90 


0.90 


0.70 


0.84 


0.90 0.94 


0.77 




(0.65, 0.79) 




(0.88, 0.92) 


(0.66, 0.78) 









the NPV is 0.47 (0.37, 0.54). This corresponds to the rule in which a = 
0.035. That is, an individual's CD4 count is predicted to be above 200 if the 
probability that this measurement is greater than 200 is at least 1 — 0.035 = 
96.5%. For the same CD4 threshold, a FP rate of 0.05 corresponds to a 
sensitivity of 0.66 (0.60, 0.75), PPV of 0.99 (0.98, 0.99) and NPV of 0.36 
(0.31, 0.46). 

In order to further evaluate model performance, we apply our prediction 
rule to 399 observations across n = 72 individuals from an independent co- 
hort in Philadelphia. We use only baseline CD4 counts to make predictions, 
assuming that this is all that is available. The median baseline CD4 in this 
cohort is 260.5 cells/mm 3 and the IQR is (159.0,354.2). Test sample es- 
timates for sensitivity, false positive rate, PPV and NPV are provided in 
Tables 3(a) and (b) for each of the prediction rules. A tabular summary of 
counts for one rule based on the LMM approach is given in Table 4. The 
total count is n = 327 since there are 399 — 72 = 327 post-baseline measure- 
ments for this cohort. In this case, n = 240 measurements are predicted to 
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Table 4 

Observed and predicted counts (based on test sample 
data) 



Observed 







>200 


<200 


Total 


Predicted 


>200 


238 


2 


240 




<200 


44 


43 


87 




Total 


282 


45 


327 



be above the threshold, while 87 are predicted below. Since this is intended 
as a prioritization tool, this rule would suggest performing a true CD4 test 
on the 87 observations that are predicted below the threshold to confirm 
the true value. A "savings" associated with this rule is 240/327 = 73% since 
a CD4 test would not be required for this percentage of the observations. 
The "cost" is the associated false positive rate of 2/45 = 4.4%. Interestingly, 
the test sample estimates based on the LMM approach [Table 3(b)] appear 
slightly better than the resubstitution estimates. In fact, in some cases, these 
test sample estimates are greater than the 95% bootstrap confidence limits 
derived based on the learning sample. This result may be a consequence of 
the overall slightly higher baseline CD4 count in the Philadelphia (test sam- 
ple) cohort. A discussion of the potential utility of stratified analysis (e.g., 
according to baseline CD4 counts) is provided in Section 5. 

4. Extensions. In this section we briefly describe two extensions of the 
method outlined in Section 2 to illustrate its flexibility and directions for 
further development. First, we consider one approach to incorporating in- 
formation about the individual level random effects into our prediction al- 
gorithm for the linear mixed effects setting. This approach is relevant as 
it provides a potential framework for incorporating observed, post-baseline 
CD4 counts into the model. Additionally, it illustrates the trade-off between 
using baseline data within the fixed effects design matrix, and using these 
data to inform prediction of the random effects. Second, we detail how this 
method can be applied to making predictions about changes in CD4 count 
over time. Extensions for modeling alternative outcomes are relevant, as clin- 
ical decision making generally takes into account both absolute and relative 
CD4 count changes. 

4.1. Using observed response data to inform BLUPs of random effects. 
While leading to a prediction rule with good predictive performance, the 
approach described in Section 2 does not take into account the latent effects 
that result in some individuals having higher or lower responses, information 
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that is typically captured in random effects. Several alternatives exist. For 
example, the prediction variance used in the example above is based on the 
usual regression setting, Var(j/jj — y^-) = Var(xjj/3 — Xij/3 — e). Alternatively, 

we could use Var(yij - yij) = Var(xjj/3 - Xij/3 - Zijbi - s) = Xjj Var(/J)x^ + 

ZjjDz^ + a 2 . That is, while we let bi = 0, we still include the true b{ in the 
prediction variance formula. Based on the London data, this results in slight, 
yet unremarkable improvements in sensitivity (results not shown). 

We can also estimate the random effects for new individuals based on 
baseline data. In the example provided, we assume only baseline CD4 counts 
are available, and these are used in the fixed effects design matrix rather 
than informing the random effects. To begin, we propose fitting the model 
of equation (2.3) with the slight modification that the observed baseline 
CD4 count, given by mo, is now included in the response vector Yj and 
removed from the design matrix Xj . In order to estimate the random effects 
for a new individual (whose complete response vector yj is unobserved), 
we calculate the conditional expectation of the random effects, given the 
baseline (observed) response yiQ. That is, we replace hi = E(pi\yi) with bi = 
E(bi\mo) = (Uio -Xio3o)/(-Di,i +of)-Di,-, where D^i is the (1,1) element of 
D corresponding to the estimated variance of the intercept random effect, 
D\ . is the column vector corresponding to the first column of D, /3q is the 
first element of /3 corresponding to the intercept fixed effect and Xjo is the 
first row of Xj. This equation is derived simply by replacing the matrix 
Zj with its first row and replacing the vectors yj and Xj/3 with their first 
elements in the formula 6, = E{bi\yi) = DZ?"£~ (yj — Xj/3). 

Notably, this is not the same prediction of bi that would have been arrived 
at if the complete data vector yj were observed and so the alternative no- 
tation bi is used. Through use of the first column of the D matrix, we draw 
on the estimated covariance between the random effects to fill in values for 
both the intercept and slope random effects for each individual, while only 
relying on baseline values of the response. Finally, we additionally replace 
Var(6j — bi) and Cov(/3,6j — bi) with Var(6j — bi) and Cov(/3,6j — bi), respec- 
tively in the formula for Var(yjj — yij). Applications of this approach to the 
London data (results not shown) are similar to those reported, suggesting 
that, in this data example, using the modified BLUPs in place of treating 
baseline as CD4 as a predictor variable does not improve our prediction al- 
gorithm. Observed post-baseline measures of CD4 that occur prior to the 
time of prediction could be incorporated similarly into the predicted random 
effects. 

4.2. Making predictions about the percentage change in CD4 count over 
time. In Sections 2 and 3 we focus on the setting in which interests lie in 
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predicting the response at a single time point. More generally, we may want 
to make a prediction about a function of the CD4 counts for individual i 
across a combination of time points j. For example, we may be interested 
in the percentage change in CD4 count over a specified period of time, 
given by the function ft(Yij) = {Yij — where (j — j') = t. We can 

again begin by fitting the linear model of equation (2.3) to the repeated 
CD4 count measures and arriving at predictions for new observations based 
on this model. The predicted percentage change for a single individual i is 
then given by ft(Vij) = (yij — yij') /yij- In order to determine the prediction 
variance of ft(yij), we use the multivariate delta method. Based on a first- 
order Taylor series expansion, we have Var[/ t (yy)] = Var[(yjj — yij') /yij] = 
Vav[yij' /yij] « U T VU, where U T = (l/yij —yij' /yij) is the score vector 



calculated using the same formula as for \ax(xjij) above, where the vectors 
Xjj and Zij are replaced by matrices with rows corresponding to the time 
points j and j' . Further exploration of the utility of fitting a LMM and 
identifying an associated prediction rule for the percentage change in CD4 
count, or a rule that evaluates simultaneously the absolute level and the 
percentage change within the PBC framework, is ongoing research. 

5. Discussion. This manuscript presents an analytic approach, which we 
term PBC, for predicting a quantitative biomarker trajectory over time that 
combines the generalized linear mixed effects model with an ROC curve 
type approach. Two approaches to approximating the prediction rule of 
equation (2.5) are considered. In the first case, we dichotomize the data a 
priori and model the resulting binary outcomes over time; a generalized lin- 
ear mixed effects modeling approach is applied for direct estimation of 6ij. 
Since we ultimately aim to arrive at a binary prediction rule, this approach 
is intuitively appealing and consistent with applications of the logistic model 
for prediction. In the second case, we model the data using a linear mixed 
effects model, a standard approach to the analysis of unevenly spaced, re- 
peated measures data with a continuous response and multiple predictor 
variables. The results of this model fitting procedure are used in turn to 
inform predictions, in this case using a rule that involves the lower bound 
of the corresponding prediction interval. This second approach also offers 
intuitive appeal since it allows for use of all of the observed data to inform 
the model fit. A similar approach as the one described herein can be applied 
for modeling pathogenesis, though the additional population level variabil- 
ity in CD4 counts in the absence of therapy may lead to lower predictive 
performance. 

PBC differs in two regards from methods currently employed in this set- 
ting. First, we apply first-stage modeling that can incorporate the full range 
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of multiple continuous and categorical predictors, as well as quantitative 
data on our outcome (CD4 count) to inform our analysis. Estimated mean 
and variance components from this model fitting procedure are subsequently 
used to define a rule for predicting whether a function of the observed CD4 
count (within and across time points) is above or below a clinically mean- 
ingful threshold. Multiple patient level characteristics can be incorporated, 
including observed baseline CD4 count and time- varying values of the poten- 
tially predictive markers as described in Section 3. The proposed approach 
is different from previously described approaches for this setting since mod- 
eling is performed using all of the available data and a prediction rule is 
associated with the resulting model. One potential advantage is that we are 
able to draw on the full range of both the predictor and outcome data to in- 
form our investigation while still providing a binary decision rule for clinical 
decision making based on resulting probability estimates. 

A second difference is that PBC provides a framework for modeling CD4 
count trajectories over time that is not limited to characterizing changes be- 
tween two time points. Specifically, we consider models with a single knot at 
one month after initiation of ART to account for the rapid increase in CD4 
count that is typically observed and the subsequently slower rise over time 
[Laird and Ware (1982); Fitzmaurice, Laird and Ware (2004)]. The GLMM 
is applied with individual level random intercept and slope terms in order 
to account for the within person correlation inherent in repeated measures 
data. The use of a mixed effects model for longitudinal CD4 data has been 
described for monitoring response to therapy [Mahajan et al. (2004)]; how- 
ever, the aim of that investigation differed in that the investigators applied 
the mixed model to uncover the within and between person variability in 
TLC for fixed changes in CD4 count. In our setting, the mixed model is used 
as a tool within a predictive algorithm that allows for prediction across a 
temporal trajectory. 

Several manuscripts also report receiver operating characteristic (ROC) 
curve analyses using information on TLC as well as other markers, such 
as hemoglobin to predict CD4 count. To our knowledge, all such investiga- 
tions involve a first-stage dichotomization of the proposed markers as well 
as the outcome CD4 count. For example, Spacek et al. describe an approach 
involving cutoff points for TLC (<1200 cells/mm 3 and >2000 cells/mm 3 ) 
and/or hemoglobin (>12 g/dl) [Spacek et al. (2003)], while others propose 
dichotomizing TLC based on whether the change over a specified time pe- 
riod is greater than [Badri and Wood (2003); Mahajan et al. (2004)]. CD4 
count is also dichotomized (<200 cells/mm 2 ) for each observation based 
on the absolute value at a given time point or the change over a specified 
period. These investigations generally include reporting of sensitivity, speci- 
ficity, positive predictive value (PPV) and negative predictive value (NPV), 
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where sensitivity and specificity are denned in the usual manner as the pro- 
portions respectively of those predicted positive among those truly positive 
and those predicted negative among those truly negative. Through consid- 
eration of multiple cutoff points for both predictor and outcomes, ROC 
curves are generated that illustrate the trade-off between sensitivity and 
specificity. 

Logistic regression models have also been described as a useful tool in 
this setting [Bagchi et al. (2007); Spacek et al. (2003)]. These methods draw 
strength on the continuous nature of the potentially predictive markers, such 
as TLC, while using a dichotomized version of CD4 count. Logistic models 
have the advantage of offering a framework for incorporating multiple con- 
tinuous or categorical predictor variables and accounting for the confound- 
ing and/or effect modifying role of patient specific demographic and clinical 
factors. Adjusted odds ratios are reported from these model fits. While this 
approach uses more information on the available data, it involves first di- 
chotomizing CD4 counts and does not include reporting of sensitivity and 
specificity, two clinically appealing and relevant concepts. 

An extensive literature also exists on methodologies for ROC curves as 
summarized in Zhou, Obuchowski and McClish (2002) and Pepe (2000b). 
Within this body of research, methods for incorporating ordinal and contin- 
uous predictors have been described [Pepe (1998, 2005); Tosteson and Begg 
(1988)] as well as approaches to handling repeated marker data [Emir et al. 
(1998)]. To our knowledge, however, these methods are developed primar- 
ily for a dichotomous outcome such as "diseased" or "not diseased." In our 
setting, both the predictor variables and outcome of interest are continu- 
ous biomarkers, which serve as a primary motivation for the linear mixed 
effects modeling approach we describe. Specifically, we aim to incorporate 
and draw strength from the complete observed response data (rather than 
a dichotomized version) to arrive at a prediction rule. 

Similar to our approach, methods for time-dependent ROC curves, as 
described in Heagerty, Lumley and Pepe (2000), aim to characterize a time- 
varying clinical measure of disease progression within a prediction frame- 
work. Heagerty, Lumley and Pepe (2000) provide an eloquent approach for 
the setting of a survival outcome, in which the binary indicator for disease 
status is potentially censored and can vary over time, and which involves 
direct modeling of the sensitivity and specificity. In our setting, the outcome 
of interest is a continuous biomarker and, thus, direct modeling of the sen- 
sitivity and specificity in this fashion is not tenable. Instead, we consider 
two approaches, one that involves direct modeling of the probability that 
the outcome is above a threshold and the second that approximates the 
prediction rule through use of a corresponding prediction interval. Further 
extensions involving modeling of time to CD4 count below a meaningful 
threshold would be interesting. 
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Methods involving generalized linear models and mixed effects models 
have been described for estimating ROC curves [Albert (2007); Gatsonis 
(1995); Pepe (2000a)]. As noted by Dodd and Pepe (2003), PBC in its orig- 
inal formulation is an approach to estimation of the area under the ROC 
curve given by the probability that the response in the group is greater than 
the response is another group. The setting described herein differs, however, 
since here estimation is described for the probability that an observation is 
greater than a given threshold and not for the comparison of two groups. A 
ROC curve is then generated based on a prediction rule that incorporates 
this estimated probability. Finally, we note that our algorithm involves gen- 
erating a single ROC curve based on a set of predictors determined in a 
model fitting framework. This distinguishes our strategy from approaches 
that aim to identify the most predictive set of markers by evaluating the 
areas under the curve across several sets of predictors, such as Bisson et al. 
(2008). 

PBC may be a clinically useful tool for predicting whether an individual's 
CD4 count will be greater than a given threshold based on less-expensive 
laboratory measures, including WBC and lymphocyte percent. For the data 
example presented, using the continuous range of the CD4 data and appli- 
cation of the linear mixed effects model appears to offer better predictive 
performance than a first stage dichotomization and application of the gener- 
alized linear mixed model. This is evidenced in both the resubstitution and 
test sample estimates of predictive performance. For example, for a CD4 
threshold of 350 and a test sample FP rate of 4%, the GLMM approach 
results in test sample Sensitivity = 0.50, PPV = 0.78 and NPV = 0.88. The 
LMM approach, on the other hand, yields test sample Sensitivity = 0.64, 
PPV = 0.82 and NPV = 0.91 for the same cutoff and test sample FP rate. 
While we have not demonstrated a statistically significant difference between 
the two approaches, a clear trend is observed across all rules for both the 
test and learning sample data. 

The primary advantages of this strategy over the tools described in Sec- 
tion 1 for this data setting are as follows: (1) it allows us to draw strength 
from the full range of continuous outcome data (through linear modeling) 
while providing us with clinically relevant measures, such as positive pre- 
dictive value (through subsequent classification based on probability thresh- 
olds) and (2) it allows for simultaneous consideration of unevenly spaced 
biomarker measurements over time. In the example described for predict- 
ing absolute CD4 count based on a 200-level threshold, a positive predic- 
tive value of 0.98 is observed with a false positive rate of 0.05, suggesting 
this approach may be useful in developing alternative clinical management 
strategies. The relatively low NPV of 0.36 suggests that the approach de- 
scribed herein may serve best as a prioritization tool that allows for the re- 
duction in higher-end capacity testing, while not replacing the use of these 
tests. 
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The clinical utility of this tool, however, will require further consideration 
of additional clinical and environmental factors as well as an in-depth anal- 
ysis of a diverse array of cohorts. For example, the application presented 
in Section 3 is based on data from the London cohort in which a median 
baseline CD4 count of 219.5 is observed. Baseline CD4 counts at initiation of 
therapy tend to be lower in resource poor settings since treatment guidelines 
in these settings impose a lower threshold for starting ARTs. The implica- 
tion of differing patient level characteristics such as baseline CD4 count on 
the appropriateness of this approach as a diagnostic tool still requires thor- 
ough assessment. Stratified analyses may also be informative in identifying 
subgroups for which the tool is best suited. For example, characterizing the 
relative performance among viremic and nonviremic patients, or during ear- 
lier and later exposure to ARTs, will provide additional insight into the 
large-scale relevance of this approach. In addition, the example presents a 
prediction for each observation within an individual. Characterizing this ap- 
proach for predicting that any of an array of observations for an individual 
will be above the threshold would provide further insight into its utility. Fi- 
nally, it may be useful to additionally incorporate the acquired CD4 counts 
of those individuals who are tested because they are predicted to be below 
the threshold. We are currently investigating these alternative questions and 
settings. 

The PBC approach we describe relies heavily on observing baseline CD4 
counts. We are currently exploring application of this approach to data aris- 
ing from the Women's Interagency HIV Study (WIHS) and the Multicenter 
AIDS Cohort study (MACS) cohorts in which dates of initiation of therapy 
are observed only within a six month window. This presents an additional 
challenge since our model includes a rapid rise in CD4 counts over the first 
one month of therapy followed by a slower sustained increase. Thus, in its 
current formulation, the precise time of ART initiation is crucial. Further 
extensions may provide tools necessary for these alternative settings; how- 
ever, collection of baseline CD4 count data at initiation of therapy for HIV 
is routine in most settings and, thus, this does not diminish the potential 
relevance of PBC for this application. 

We also note that the proposed PBC framework is not limited to the choice 
of design matrices given in Section 3. Incorporation of additional potentially 
clinically relevant variables such as sex and weight in the model fitting stage 
is straightforward. As the model fit improves and the prediction variance 
decreases, the value of a in equation (2.5) corresponding to the best predic- 
tion rule will likely change. In the extreme case that the prediction variance 
tends to 0, we have that lij i0l of equation (2.7) approaches yij regardless of 
a. In this case, since the observed and predicted values would be very close, 
all prediction rules would perform equally well with sensitivity and speci- 
ficity close to unity. In addition, alternative more sophisticated models may 
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offer improved accuracy. For example, Chu et al. (2005) describe a Bayesian 
random change point model for predicting CD4 trajectories that includes 
both population and individual level change points. Incorporating this mod- 
eling approach into the PBC framework introduces the additional analytic 
challenge of predicting individual-level change points for new patients and 
is a direction of potential future development. 

In summary, through combining modeling and an ROC curve approach, 
PBC provides a flexible statistical framework for appropriately modeling 
continuous biomarker data using all available data on the biomarker as well 
as additional, potentially relevant continuous or categorical predictors. At 
the same time, it offers interpretable measures of diagnostic accuracy based 
on clinically determined thresholds. Notably, improved prediction of CD4 
count based on less-expensive and more widely available laboratory mea- 
sures, such as lymphocyte percentage and white blood cell count, may have 
broad public health implications. A sound diagnostic tool could provide for 
more targeted CD4 testing strategies, offering a much needed instrument in 
a resource limited setting where HIV/AIDS presents the greatest burden. 
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